Polyp Segmentation Generalisability of Pretrained Backbones
Sanderson, Edward, Matuszewski, Bogdan J.
It has recently been demonstrated that pretraining backbones in a self-supervised manner generally provides better fine-tuned polyp segmentation performance, and that models with ViT-B backbones typically perform better than models with ResNet50 backbones. In this paper, we extend this recent work to consider generalisability. That is, we assess the performance of models on a different dataset to that used for fine-tuning, accounting for variation in network architecture and pretraining pipeline (algorithm and dataset). This reveals how well models with different pretrained backbones generalise to data of a somewhat different distribution to the training data, which will likely arise in deployment due to different cameras and patient demographics, amongst other factors. We observe that the previous findings, regarding pretraining pipelines for polyp segmentation, hold true when considering generalisability. However, our results imply that models with ResNet50 backbones typically generalise better, despite being outperformed by models with ViT-B backbones in evaluation on the test set from the same dataset used for fine-tuning.
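A minimal sketch of the cross-dataset evaluation protocol described above, assuming PyTorch-style models and data loaders; `dice_coefficient`, `KvasirSEG`, and `CVCClinicDB` are illustrative placeholders rather than the authors' actual code:

```python
import torch
from torch.utils.data import DataLoader

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Dice coefficient per sample for binary masks of shape (N, H, W)."""
    pred = pred.flatten(1).float()
    target = target.flatten(1).float()
    intersection = (pred * target).sum(dim=1)
    return (2 * intersection + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)

@torch.no_grad()
def evaluate(model, loader, device="cuda"):
    """mDice of `model` over `loader`. Generalisability is probed by passing
    a loader for a dataset *not* used in fine-tuning."""
    model.eval()
    scores = []
    for images, masks in loader:
        logits = model(images.to(device))
        preds = (logits.sigmoid() > 0.5).squeeze(1).cpu()
        scores.append(dice_coefficient(preds, masks))
    return torch.cat(scores).mean().item()

# Fine-tune on one dataset, then evaluate on the other (and vice versa), e.g.:
# in_dist  = evaluate(model, DataLoader(KvasirSEG(split="test"), batch_size=8))
# out_dist = evaluate(model, DataLoader(CVCClinicDB(split="all"), batch_size=8))
```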
A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Sanderson, Edward, Matuszewski, Bogdan J.
Solutions to vision tasks in gastrointestinal endoscopy (GIE) conventionally use image encoders pretrained in a supervised manner with ImageNet-1k as backbones. However, the use of modern self-supervised pretraining algorithms and a recent dataset of 100k unlabelled GIE images (Hyperkvasir-unlabelled) may allow for improvements. In this work, we study the fine-tuned performance of models with ResNet50 and ViT-B backbones pretrained in self-supervised and supervised manners with ImageNet-1k and Hyperkvasir-unlabelled (self-supervised only) in a range of GIE vision tasks. In addition to identifying the most suitable pretraining pipeline and backbone architecture for each task, out of those considered, our results suggest: that self-supervised pretraining generally produces more suitable backbones for GIE vision tasks than supervised pretraining; that self-supervised pretraining with ImageNet-1k is typically more suitable than pretraining with Hyperkvasir-unlabelled, with the notable exception of monocular depth estimation in colonoscopy; and that ViT-Bs are more suitable in polyp segmentation and monocular depth estimation in colonoscopy, ResNet50s are more suitable in polyp detection, and both architectures perform similarly in anatomical landmark recognition and pathological finding characterisation. We hope this work draws attention to the complexity of pretraining for GIE vision tasks, informs the development of more suitable approaches than the convention, and inspires further research on this topic to help advance that development. Code available: github.com/ESandML/SSL4GIE
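To illustrate what swapping pretraining pipelines for a backbone might look like before fine-tuning, here is a hedged sketch using torchvision's ResNet50; the checkpoint path and the loading of self-supervised weights via `load_state_dict` are assumptions for illustration, not the SSL4GIE repository's API:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

def build_backbone(pretraining: str):
    if pretraining == "supervised-imagenet":
        # Conventional baseline: supervised ImageNet-1k weights.
        return resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
    model = resnet50(weights=None)
    if pretraining == "ssl-checkpoint":
        # Self-supervised weights (e.g. pretrained on ImageNet-1k or
        # Hyperkvasir-unlabelled) loaded from a local checkpoint.
        state = torch.load("ssl_resnet50.pth", map_location="cpu")  # hypothetical path
        model.load_state_dict(state, strict=False)
    return model

backbone = build_backbone("supervised-imagenet")
# A task-specific head (segmentation, detection, etc.) would then be
# attached and the whole model fine-tuned on the downstream dataset.
```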
FCB-SwinV2 Transformer for Polyp Segmentation
Fitzgerald, Kerr, Matuszewski, Bogdan
Polyp segmentation within colonoscopy video frames using deep learning models has the potential to automate the workflow of clinicians. This could help improve the early detection rate and characterization of polyps which could progress to colorectal cancer. Recent state-of-the-art deep learning polyp segmentation models have combined the outputs of Fully Convolutional Network architectures and Transformer Network architectures which work in parallel. In this paper we propose modifications to the current state-of-the-art polyp segmentation model FCBFormer. The transformer architecture of the FCBFormer is replaced with a SwinV2 Transformer-UNET and minor changes to the Fully Convolutional Network architecture are made to create the FCB-SwinV2 Transformer. The performance of the FCB-SwinV2 Transformer is evaluated on the popular colonoscopy segmentation benchmarking datasets Kvasir-SEG and CVC-ClinicDB. Generalizability tests are also conducted. The FCB-SwinV2 Transformer is able to consistently achieve higher mDice scores across all tests conducted and therefore represents new state-of-the-art performance. Issues found with how colonoscopy segmentation model performance is evaluated within literature are also reported and discussed. One of the most important issues identified is that when evaluating performance on the CVC-ClinicDB dataset it would be preferable to ensure no data leakage from video sequences occurs during the training/validation/test data partition.
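The data-leakage point above suggests partitioning at the level of video sequences rather than individual frames, so that near-duplicate frames from one sequence never straddle the train/test boundary. A minimal sketch of such a sequence-level split, where the `frame_to_sequence` mapping is a hypothetical placeholder for CVC-ClinicDB's actual sequence metadata:

```python
import random
from collections import defaultdict

def split_by_sequence(frame_to_sequence: dict, ratios=(0.8, 0.1, 0.1), seed=0):
    """Partition frames into train/val/test without splitting any sequence."""
    by_seq = defaultdict(list)
    for frame, seq in frame_to_sequence.items():
        by_seq[seq].append(frame)
    sequences = sorted(by_seq)
    random.Random(seed).shuffle(sequences)
    n = len(sequences)
    cut1 = int(ratios[0] * n)
    cut2 = int((ratios[0] + ratios[1]) * n)
    splits = {"train": sequences[:cut1],
              "val": sequences[cut1:cut2],
              "test": sequences[cut2:]}
    # Expand each partition's sequences back into their member frames.
    return {name: [f for s in seqs for f in by_seq[s]]
            for name, seqs in splits.items()}
```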
FCN-Transformer Feature Fusion for Polyp Segmentation
Sanderson, Edward, Matuszewski, Bogdan J.
Colonoscopy is widely recognised as the gold standard procedure for the early detection of colorectal cancer (CRC). Segmentation is valuable for two significant clinical applications, namely lesion detection and classification, providing means to improve accuracy and robustness. The manual segmentation of polyps in colonoscopy images is time-consuming. As a result, the use of deep learning (DL) for automation of polyp segmentation has become important. However, DL-based solutions can be vulnerable to overfitting and the resulting inability to generalise to images captured by different colonoscopes. Recent transformer-based architectures for semantic segmentation both achieve higher performance and generalise better than alternatives, however they typically predict a segmentation map of $\frac{h}{4}\times\frac{w}{4}$ spatial dimensions for a $h\times w$ input image. To address this, we propose a new architecture for full-size segmentation which leverages the strengths of a transformer in extracting the most important features for segmentation in a primary branch, while compensating for its limitations in full-size prediction with a secondary fully convolutional branch. The resulting features from both branches are then fused for final prediction of a $h\times w$ segmentation map. We demonstrate our method's state-of-the-art performance with respect to the mDice, mIoU, mPrecision, and mRecall metrics, on both the Kvasir-SEG and CVC-ClinicDB dataset benchmarks. Additionally, we train the model on each of these datasets and evaluate on the other to demonstrate its superior generalisation performance.
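A schematic sketch of the two-branch fusion idea described above, assuming PyTorch; `transformer_branch` and `fcn_branch` stand in for the paper's actual transformer and fully convolutional branches, and the fusion head shown here is illustrative only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchSegmenter(nn.Module):
    def __init__(self, transformer_branch: nn.Module, fcn_branch: nn.Module,
                 t_channels: int, f_channels: int):
        super().__init__()
        self.transformer_branch = transformer_branch  # emits (N, t_channels, h/4, w/4)
        self.fcn_branch = fcn_branch                  # emits (N, f_channels, h, w)
        self.fuse = nn.Sequential(
            nn.Conv2d(t_channels + f_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),          # full-size binary polyp logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        t = self.transformer_branch(x)                # reduced-resolution features
        t = F.interpolate(t, size=(h, w), mode="bilinear", align_corners=False)
        f = self.fcn_branch(x)                        # full-resolution features
        return self.fuse(torch.cat([t, f], dim=1))    # (N, 1, h, w) segmentation map
```

The key point the sketch captures is that the transformer branch's $\frac{h}{4}\times\frac{w}{4}$ features are upsampled and concatenated with full-resolution convolutional features, so the final prediction is made at the input's $h\times w$ resolution.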